ORDERED_LOGIT
Overview
The ORDERED_LOGIT function fits an ordered logistic regression model (also known as the proportional odds model) for ordinal dependent variables. This type of regression is appropriate when the outcome has naturally ordered categories—such as survey responses ranging from “strongly disagree” to “strongly agree,” bond ratings, or health status levels—where the ordering matters but the intervals between categories are not assumed to be equal.
Ordered logit is based on a latent variable framework. The model assumes an unobserved continuous variable y^* underlies the observed categorical responses:
y^* = X\beta + \varepsilon
where X represents the predictor variables, \beta are the regression coefficients, and \varepsilon follows a standard logistic distribution. The observed ordinal outcome y is determined by where y^* falls relative to a set of increasing cut points (thresholds) \mu_1 < \mu_2 < \ldots < \mu_{K-1} for K categories, with the conventions \mu_0 = -\infty and \mu_K = +\infty:
y = k \quad \text{if} \quad \mu_{k-1} < y^* \leq \mu_k
The probability of observing category k is:
P(y = k | X) = F(\mu_k - X\beta) - F(\mu_{k-1} - X\beta)
where F is the cumulative distribution function of the logistic distribution.
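The probability formula above can be sketched in a few lines of plain Python. The function names, the linear predictor value, and the cut points below are illustrative assumptions, not part of ORDERED_LOGIT; the sketch only shows how differences of the logistic CDF at adjacent cut points yield category probabilities that sum to one.

```python
import math

def logistic_cdf(z):
    """Standard logistic CDF F(z) = 1 / (1 + exp(-z)), with the limits at +/-inf."""
    if z == math.inf:
        return 1.0
    if z == -math.inf:
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(xb, cut_points):
    """Return [P(y = 1), ..., P(y = K)] given X*beta and increasing cut points."""
    # Pad with mu_0 = -inf and mu_K = +inf so the first and last categories
    # use the same difference formula as the interior ones.
    bounds = [-math.inf] + list(cut_points) + [math.inf]
    cdf = [logistic_cdf(b - xb) for b in bounds]
    return [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]

# Hypothetical values: X*beta = 0.5 and cut points at -1 and 1 (three categories).
probs = category_probabilities(0.5, [-1.0, 1.0])
print(probs, sum(probs))
```

Raising X\beta shifts probability mass toward the higher categories while the cut points stay fixed, which is the proportional odds assumption in action.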
This implementation uses the OrderedModel class from the statsmodels library. The function returns coefficient estimates for each predictor along with cut points that separate the ordered categories, standard errors, z-statistics, p-values, and confidence intervals. Model fit statistics include the pseudo R-squared, log-likelihood, AIC, and BIC.
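One detail worth knowing when reading raw statsmodels output: OrderedModel orders its fitted parameter vector with the predictor coefficients first, followed by the threshold parameters, and it stores the lowest threshold as-is but every later threshold as the log of its increment over the previous one. The sketch below mirrors what OrderedModel.transform_threshold_params does, without the infinite end points; the function name and the numbers are illustrative assumptions.

```python
import math

def thresholds_from_params(threshold_params):
    """Convert [first_cut, log_increment, log_increment, ...] into cut points."""
    cuts = [threshold_params[0]]
    for log_increment in threshold_params[1:]:
        # exp() keeps every increment positive, so the cut points stay ordered.
        cuts.append(cuts[-1] + math.exp(log_increment))
    return cuts

# Hypothetical threshold parameters: first cut 1.0, then increments of 1 and 2.
cuts = thresholds_from_params([1.0, 0.0, math.log(2.0)])
print(cuts)  # approximately [1.0, 2.0, 4.0]
```

This parameterization lets the optimizer search over unconstrained values while still guaranteeing increasing cut points.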
Common applications include analyzing Likert-scale survey data, credit ratings, educational attainment levels, and any scenario where outcomes fall into ranked categories. For theoretical background, see the Wikipedia article on ordered logit and the original work by McCullagh (1980).
This example function is provided as-is without any representation of accuracy.
Excel Usage
=ORDERED_LOGIT(y, x, fit_intercept, alpha)
y (list[list], required): Ordinal dependent variable as a column vector with integer category values (0, 1, 2, …) representing ordered categories.
x (list[list], required): Independent variables (predictors) as a matrix where each column represents a different predictor variable.
fit_intercept (bool, optional, default: true): Reserved for API consistency; has no effect since ordered models use cut points instead of intercepts.
alpha (float, optional, default: 0.05): Significance level for confidence intervals, strictly between 0 and 1.
Returns (list[list]): 2D list with ordered logit results, or error string.
Examples
Example 1: Basic three-category model with one predictor
Inputs:
| y | x |
|---|---|
| 0 | 1 |
| 0 | 1.2 |
| 0 | 1.4 |
| 0 | 1.6 |
| 0 | 1.8 |
| 0 | 2 |
| 1 | 1.8 |
| 1 | 2 |
| 1 | 2.2 |
| 1 | 2.4 |
| 1 | 2.6 |
| 1 | 2.8 |
| 1 | 3 |
| 1 | 3.2 |
| 2 | 2.8 |
| 2 | 3 |
| 2 | 3.2 |
| 2 | 3.4 |
| 2 | 3.6 |
| 2 | 3.8 |
Excel formula:
=ORDERED_LOGIT({0;0;0;0;0;0;1;1;1;1;1;1;1;1;2;2;2;2;2;2}, {1;1.2;1.4;1.6;1.8;2;1.8;2;2.2;2.4;2.6;2.8;3;3.2;2.8;3;3.2;3.4;3.6;3.8})
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| x0 | 6.099 | 2.441 | 2.498 | 0.01248 | 1.314 | 10.88 |
| cut_0/1 | 11.58 | 4.742 | 2.442 | 0.01462 | 2.284 | 20.87 |
| log_incr_1/2 | 1.906 | 0.4324 | 4.408 | 0.00001043 | 1.059 | 2.753 |
| pseudo_r_squared | 0.6101 | | | | | |
| log_likelihood | -8.49 | | | | | |
| aic | 22.98 | | | | | |
| bic | 25.97 | | | | | |
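The fit statistics in this table can be cross-checked from the log-likelihood alone. The sketch below uses the rounded log-likelihood (-8.49), the 3 estimated parameters, and the 20 observations of Example 1. It assumes that the null model contains thresholds only, so that its log-likelihood follows from the observed category counts (6, 8, 6); the helper names are illustrative.

```python
import math

def aic(llf, k):
    """Akaike information criterion: -2*llf + 2*k."""
    return -2.0 * llf + 2.0 * k

def bic(llf, k, n):
    """Bayesian information criterion: -2*llf + k*ln(n)."""
    return -2.0 * llf + k * math.log(n)

def mcfadden_r2(llf, category_counts):
    """McFadden pseudo R-squared, 1 - llf / ll_null.

    Assumes a thresholds-only null model, whose maximized log-likelihood
    is sum(n_k * ln(n_k / n)) over the observed category counts.
    """
    n = sum(category_counts)
    ll_null = sum(c * math.log(c / n) for c in category_counts if c > 0)
    return 1.0 - llf / ll_null

llf, k, n = -8.49, 3, 20
print(round(aic(llf, k), 2))                 # about 22.98
print(round(bic(llf, k, n), 2))              # about 25.97
print(round(mcfadden_r2(llf, [6, 8, 6]), 2)) # about 0.61
```

Small differences in the last digit are expected because the inputs are the rounded values shown in the table.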
Example 2: fit_intercept set to FALSE on the same data (results are identical because the argument has no effect)
Inputs:
| y | x | fit_intercept |
|---|---|---|
| 0 | 1 | false |
| 0 | 1.2 | |
| 0 | 1.4 | |
| 0 | 1.6 | |
| 0 | 1.8 | |
| 0 | 2 | |
| 1 | 1.8 | |
| 1 | 2 | |
| 1 | 2.2 | |
| 1 | 2.4 | |
| 1 | 2.6 | |
| 1 | 2.8 | |
| 1 | 3 | |
| 1 | 3.2 | |
| 2 | 2.8 | |
| 2 | 3 | |
| 2 | 3.2 | |
| 2 | 3.4 | |
| 2 | 3.6 | |
| 2 | 3.8 |
Excel formula:
=ORDERED_LOGIT({0;0;0;0;0;0;1;1;1;1;1;1;1;1;2;2;2;2;2;2}, {1;1.2;1.4;1.6;1.8;2;1.8;2;2.2;2.4;2.6;2.8;3;3.2;2.8;3;3.2;3.4;3.6;3.8}, FALSE)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| x0 | 6.099 | 2.441 | 2.498 | 0.01248 | 1.314 | 10.88 |
| cut_0/1 | 11.58 | 4.742 | 2.442 | 0.01462 | 2.284 | 20.87 |
| log_incr_1/2 | 1.906 | 0.4324 | 4.408 | 0.00001043 | 1.059 | 2.753 |
| pseudo_r_squared | 0.6101 | | | | | |
| log_likelihood | -8.49 | | | | | |
| aic | 22.98 | | | | | |
| bic | 25.97 | | | | | |
Example 3: Custom significance level (90% CI)
Inputs:
| y | x | alpha |
|---|---|---|
| 0 | 1 | 0.1 |
| 0 | 1.2 | |
| 0 | 1.4 | |
| 0 | 1.6 | |
| 0 | 1.8 | |
| 0 | 2 | |
| 1 | 1.8 | |
| 1 | 2 | |
| 1 | 2.2 | |
| 1 | 2.4 | |
| 1 | 2.6 | |
| 1 | 2.8 | |
| 1 | 3 | |
| 1 | 3.2 | |
| 2 | 2.8 | |
| 2 | 3 | |
| 2 | 3.2 | |
| 2 | 3.4 | |
| 2 | 3.6 | |
| 2 | 3.8 |
Excel formula:
=ORDERED_LOGIT({0;0;0;0;0;0;1;1;1;1;1;1;1;1;2;2;2;2;2;2}, {1;1.2;1.4;1.6;1.8;2;1.8;2;2.2;2.4;2.6;2.8;3;3.2;2.8;3;3.2;3.4;3.6;3.8}, , 0.1)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| x0 | 6.099 | 2.441 | 2.498 | 0.01248 | 2.084 | 10.11 |
| cut_0/1 | 11.58 | 4.742 | 2.442 | 0.01462 | 3.781 | 19.37 |
| log_incr_1/2 | 1.906 | 0.4324 | 4.408 | 0.00001043 | 1.194 | 2.617 |
| pseudo_r_squared | 0.6101 | | | | | |
| log_likelihood | -8.49 | | | | | |
| aic | 22.98 | | | | | |
| bic | 25.97 | | | | | |
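The only columns that change with alpha are ci_lower and ci_upper. They appear to be ordinary Wald intervals, estimate ± z_{1-alpha/2} × std_error, which the sketch below reproduces for the first parameter row (6.099 with standard error 2.441) using only the Python standard library. The helper name is an illustrative assumption.

```python
from statistics import NormalDist

def wald_interval(estimate, std_error, alpha=0.05):
    """Two-sided Wald confidence interval at level 1 - alpha."""
    # Standard normal quantile z_{1 - alpha/2}: about 1.645 for alpha = 0.1
    # and about 1.960 for alpha = 0.05.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * std_error, estimate + z * std_error

lo90, hi90 = wald_interval(6.099, 2.441, alpha=0.1)
lo95, hi95 = wald_interval(6.099, 2.441, alpha=0.05)
print(round(lo90, 3), round(hi90, 2))  # close to 2.084 and 10.11
print(round(lo95, 3), round(hi95, 2))  # close to 1.314 and 10.88
```

Because the inputs are rounded table values, the last digit can differ slightly from the function's output.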
Example 4: Multiple predictors with all arguments specified
Inputs:
| y | x (column 1) | x (column 2) | fit_intercept | alpha |
|---|---|---|---|---|
| 0 | 1 | 1 | true | 0.05 |
| 0 | 1.2 | 0.9 | | |
| 0 | 1.4 | 1.1 | | |
| 0 | 1.6 | 0.8 | | |
| 0 | 1.8 | 1.2 | | |
| 0 | 2 | 0.7 | | |
| 1 | 1.8 | 1.3 | | |
| 1 | 2 | 1.4 | | |
| 1 | 2.2 | 0.9 | | |
| 1 | 2.4 | 1.5 | | |
| 1 | 2.6 | 1 | | |
| 1 | 2.8 | 1.6 | | |
| 1 | 3 | 1.1 | | |
| 1 | 3.2 | 1.7 | | |
| 2 | 2.8 | 1.8 | | |
| 2 | 3 | 1.2 | | |
| 2 | 3.2 | 1.9 | | |
| 2 | 3.4 | 1.3 | | |
| 2 | 3.6 | 2 | | |
| 2 | 3.8 | 1.4 | | |
Excel formula:
=ORDERED_LOGIT({0;0;0;0;0;0;1;1;1;1;1;1;1;1;2;2;2;2;2;2}, {1,1;1.2,0.9;1.4,1.1;1.6,0.8;1.8,1.2;2,0.7;1.8,1.3;2,1.4;2.2,0.9;2.4,1.5;2.6,1;2.8,1.6;3,1.1;3.2,1.7;2.8,1.8;3,1.2;3.2,1.9;3.4,1.3;3.6,2;3.8,1.4}, TRUE, 0.05)
Expected output:
| parameter | coefficient | std_error | z_statistic | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|
| x0 | 6.216 | 2.927 | 2.124 | 0.03369 | 0.4794 | 11.95 |
| x1 | 3.659 | 2.555 | 1.432 | 0.1521 | -1.349 | 8.667 |
| cut_0/1 | 15.78 | 6.85 | 2.304 | 0.02124 | 2.355 | 29.21 |
| log_incr_1/2 | 2.111 | 0.4512 | 4.68 | 0.000002872 | 1.227 | 2.996 |
| pseudo_r_squared | 0.6677 | | | | | |
| log_likelihood | -7.236 | | | | | |
| aic | 22.47 | | | | | |
| bic | 26.46 | | | | | |
Python Code
import math

import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel as statsmodels_ordered_model


def ordered_logit(y, x, fit_intercept=True, alpha=0.05):
    """
    Fits an ordered logistic regression model for ordinal outcomes.

    See: https://www.statsmodels.org/stable/generated/statsmodels.miscmodels.ordinal_model.OrderedModel.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        y (list[list]): Ordinal dependent variable as a column vector with integer category values (0, 1, 2, ...) representing ordered categories.
        x (list[list]): Independent variables (predictors) as a matrix where each column represents a different predictor variable.
        fit_intercept (bool, optional): Reserved for API consistency; has no effect since ordered models use cut points instead of intercepts. Default is True.
        alpha (float, optional): Significance level for confidence intervals, between 0 and 1. Default is 0.05.

    Returns:
        list[list]: 2D list with ordered logit results, or error string.
    """
    def to2d(val):
        # Wrap a scalar into a 1x1 2D list; leave lists untouched.
        return [[val]] if not isinstance(val, list) else val

    def validate_numeric(val, name):
        # Return an error message for non-numeric or non-finite values, else None.
        if not isinstance(val, (int, float)):
            return f"Invalid input: {name} must be a number."
        if math.isnan(val) or math.isinf(val):
            return f"Invalid input: {name} must be finite."
        return None

    # Normalize inputs
    y = to2d(y)
    x = to2d(x)

    # Validate y is a column vector
    if not isinstance(y, list) or len(y) == 0:
        return "Invalid input: y must be a non-empty 2D list."
    if not all(isinstance(row, list) and len(row) == 1 for row in y):
        return "Invalid input: y must be a column vector (2D list with one column)."

    # Validate x is a matrix
    if not isinstance(x, list) or len(x) == 0:
        return "Invalid input: x must be a non-empty 2D list."
    if not all(isinstance(row, list) for row in x):
        return "Invalid input: x must be a 2D list."
    num_rows_x = len(x)
    num_cols_x = len(x[0]) if num_rows_x > 0 else 0
    if num_cols_x == 0:
        return "Invalid input: x must have at least one column."
    if not all(len(row) == num_cols_x for row in x):
        return "Invalid input: x must have consistent row lengths."

    # Check y and x have same number of rows
    if len(y) != num_rows_x:
        return "Invalid input: y and x must have the same number of rows."

    # Validate fit_intercept
    if not isinstance(fit_intercept, bool):
        return "Invalid input: fit_intercept must be a boolean."

    # Validate alpha
    err = validate_numeric(alpha, "alpha")
    if err:
        return err
    if alpha <= 0 or alpha >= 1:
        return "Invalid input: alpha must be between 0 and 1."

    # Extract y values
    y_flat = []
    for row in y:
        val = row[0]
        err = validate_numeric(val, "y value")
        if err:
            return err
        y_flat.append(val)

    # Check y values are integers
    for val in y_flat:
        if val != int(val):
            return "Invalid input: y must contain integer category values."

    # Extract x values
    x_matrix = []
    for row in x:
        x_row = []
        for val in row:
            err = validate_numeric(val, "x value")
            if err:
                return err
            x_row.append(float(val))
        x_matrix.append(x_row)

    # Convert to numpy arrays
    y_array = np.array(y_flat)
    x_array = np.array(x_matrix)

    # Set parameter names
    param_names = [f"x{i}" for i in range(num_cols_x)]

    # Fit the ordered logit model
    # Note: OrderedModel uses cut points (thresholds) instead of traditional intercepts.
    # The cut points are always estimated and capture what would be the intercept.
    # The fit_intercept parameter is kept for API consistency but has no effect.
    try:
        model = statsmodels_ordered_model(y_array, x_array, distr='logit')
        result = model.fit(disp=0, method='bfgs')
    except Exception as exc:  # noqa: BLE001
        return f"Model fitting error: {exc}"

    # Extract results
    output = [["parameter", "coefficient", "std_error", "z_statistic", "p_value", "ci_lower", "ci_upper"]]

    # Get confidence intervals
    try:
        conf_int = result.conf_int(alpha=alpha)
    except Exception as exc:  # noqa: BLE001
        return f"Confidence interval error: {exc}"

    # Extract parameter estimates and their statistics
    params = result.params
    std_errors = result.bse
    z_stats = result.tvalues
    p_values = result.pvalues

    # Determine number of categories
    n_categories = len(set(y_flat))
    n_thresholds = n_categories - 1

    # OrderedModel orders its parameters as predictor coefficients first,
    # followed by the threshold parameters. The lowest threshold is the cut
    # point itself; each later threshold parameter is the log of the increment
    # over the previous cut point, so it is labeled log_incr rather than cut.
    row_names = list(param_names)
    for i in range(n_thresholds):
        prefix = "cut" if i == 0 else "log_incr"
        row_names.append(f"{prefix}_{i}/{i+1}")

    # Add one row per estimated parameter, in statsmodels' own order
    for i, row_name in enumerate(row_names):
        output.append([
            row_name,
            float(params[i]),
            float(std_errors[i]),
            float(z_stats[i]),
            float(p_values[i]),
            float(conf_int[i, 0]),
            float(conf_int[i, 1])
        ])

    # Add model statistics
    output.append(["pseudo_r_squared", float(result.prsquared), "", "", "", "", ""])
    output.append(["log_likelihood", float(result.llf), "", "", "", "", ""])
    output.append(["aic", float(result.aic), "", "", "", "", ""])
    output.append(["bic", float(result.bic), "", "", "", "", ""])

    return output